(a) The ROC curves of Jackknife CART and C5.0 models for the factor Xa
cleavage data. Their AUC values were 0.856 and 0.916, respectively. (b) The
sequence logos generated for the factor Xa protease data using the ggseqlogo
package. The upper panel is the sequence logo of the non-cleaved peptides and
the lower panel is the sequence logo of the cleaved peptides.
The random forest algorithm
The aforementioned decision tree algorithms employ the orthogonal
partitioning approach, i.e., each partitioning rule employs one variable and
compares the value of that variable against a constant, such as x < T. Using
this kind of partitioning strategy, a complex data space may need many
partitioning rules to generate a decision tree model. Figure 3.47(a) shows
a data space in which there are two classes of data points. If the orthogonal
partitioning approach is used with three partitioning rules, the decision
tree constructed for this data set is the one shown in Figure 3.47(b). The
first partitioning rule and the third partitioning rule do not generate pure
subspaces. The consequence is that only one subspace is pure for one class.
This is shown in Figure 3.47(b), where only one leaf node is coloured by
a single colour. For instance, the left subspace generated by the first
partitioning rule (of the form x < T) is composed of 17 data points of one
class and one data point of the other class. There is no doubt that if all
subspaces are required to be pure, the derived decision tree will be very
complex.
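
The cost of orthogonal partitioning can be illustrated with a small sketch.
The code below is not the CART or C5.0 implementation discussed above. It is
a minimal greedy tree builder written for illustration, assuming Gini
impurity as the split criterion and a synthetic 6 x 6 grid of points. All
names (gini, best_orthogonal_split, count_rules) are hypothetical. It counts
how many rules of the form x[j] < T are needed before every subspace is pure,
first for a class boundary parallel to an axis and then for a diagonal one.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_orthogonal_split(points, labels):
    """Find the rule x[j] < T with the lowest weighted Gini impurity.

    Returns (j, T), or None when no variable has two distinct values.
    """
    best, best_score = None, None
    n = len(points)
    for j in range(len(points[0])):
        values = sorted(set(p[j] for p in points))
        # Candidate thresholds are midpoints between adjacent values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for p, y in zip(points, labels) if p[j] < t]
            right = [y for p, y in zip(points, labels) if p[j] >= t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best_score is None or score < best_score:
                best, best_score = (j, t), score
    return best


def count_rules(points, labels):
    """Count the partitioning rules used until every subspace is pure."""
    if len(set(labels)) <= 1:
        return 0  # already pure: this subspace becomes a leaf
    split = best_orthogonal_split(points, labels)
    if split is None:
        return 0  # identical points with mixed labels cannot be separated
    j, t = split
    left = [(p, y) for p, y in zip(points, labels) if p[j] < t]
    right = [(p, y) for p, y in zip(points, labels) if p[j] >= t]
    return 1 + count_rules(*zip(*left)) + count_rules(*zip(*right))


# Synthetic data (an assumption for illustration): a 6 x 6 integer grid.
grid = [(x, y) for x in range(6) for y in range(6)]

# Boundary parallel to an axis: the class depends on one variable only.
axis_labels = [1 if x < 3 else 0 for x, y in grid]
# Diagonal boundary: the class depends on a combination of both variables.
diagonal_labels = [1 if x + y < 5 else 0 for x, y in grid]

axis_rules = count_rules(grid, axis_labels)
diagonal_rules = count_rules(grid, diagonal_labels)
print("rules for axis-parallel boundary:", axis_rules)
print("rules for diagonal boundary:", diagonal_rules)
```

The axis-parallel boundary is separated by a single rule, whereas the
diagonal boundary cannot be separated by any single rule of this form, so the
tree must keep splitting until each subspace is pure.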